Intro to R

DAPT 617 ANALYTICS COMPUTING I

Welcome to Analytics Computing!

Let’s Jam

Post your thoughts…

https://jamboard.google.com/d/1LpJL4xq66UWPS0LVQ7gKJ2qKj5q-ivQetItGH26urls/edit?usp=sharing

Course Goal: Add a new tool to your data science toolkit

Why R?

  • It’s free!
  • R & Python are top skills in data science, engineering, and machine learning, and both are highly valuable for analysts
  • Data Visualization

  • Big community of R Users and contributed packages

Why not R?

Steep Learning Curve Ahead

Resources along the way

How will we get there?

Examples of R?

R can be a calculator

R can be used to do basic math…

1 + 2
[1] 3

Calculations follow the PEMDAS order of operations: Parentheses, Exponents, Multiplication and Division, Addition and Subtraction.

3 * 2 + 6
[1] 12
(3 * 2) + 6
[1] 12
3 * (2 + 6)
[1] 24
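
A slightly longer sketch of the same rule, using made-up numbers: exponents are evaluated before multiplication, and parentheses override both. The variable names a, b, and d are only for illustration.

```r
# Exponent first (3^2 = 9), then multiplication (9 * 4 = 36), then addition
a <- 2 + 3^2 * 4
a  # 38

# Parentheses force the addition first: (2 + 3) = 5, then 5^2 = 25, then 25 * 4
b <- (2 + 3)^2 * 4
b  # 100

# The exponent is applied before the minus sign, so this is -(2^2)
d <- -2^2
d  # -4
```

When in doubt, adding parentheses makes the intended order explicit.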

Variables

A variable can store any data type (e.g. numeric, character, date, logical) or object (e.g. functions, vectors, plots).

# assign variable using the assignment `<-` operator (preferred)
x <- 5
x
[1] 5
# assign variable using the `=` operator
y = 2
y
[1] 2
z <- x^y
z
[1] 25

Variables (continued)

Remove a variable from the environment with the function rm.

z
[1] 25
rm(z)
z
Error in eval(expr, envir, enclos): object 'z' not found

Vectors

A vector is a collection of elements of the same type. Operations can be applied to each element of the vector automatically.

my_vector <- c(1, 2, 3, 4, 5)
my_vector
[1] 1 2 3 4 5
my_vector * 2
[1]  2  4  6  8 10
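
Because every element must share one type, R quietly converts mixed inputs to a common type. A minimal sketch with hypothetical vectors mixed and nums:

```r
# Mixing numbers and text: every element is converted to character
mixed <- c(1, "two", 3)
mixed         # "1" "two" "3"
class(mixed)  # "character"

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
nums <- c(10, TRUE, FALSE)
nums          # 10 1 0
class(nums)   # "numeric"
```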

Generate a sequence using the : operator.

my_sequence <- -5:15
my_sequence
 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

Apply a comparison to every element of a vector.

my_sequence <= 10
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[13]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
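
One common use of a comparison result, sketched here with the same sequence, is to filter or count elements. The names small_values and n_up_to_ten are hypothetical.

```r
my_sequence <- -5:15

# Keep only the elements where the comparison is TRUE
small_values <- my_sequence[my_sequence <= 0]
small_values  # -5 -4 -3 -2 -1 0

# TRUE counts as 1, so sum() counts how many elements pass the test
n_up_to_ten <- sum(my_sequence <= 10)
n_up_to_ten   # 16
```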

R vs. RStudio

RStudio

Exercise: Create an R Script

  • Open RStudio
  • Create an R script called my_script.R that:
    • Assigns a variable my_name with your name
    • Assigns a variable my_number with your lucky number
    • Assigns a variable my_vector with a vector of 5 numbers
    • Multiplies my_vector by my_number
    • Prints the contents of each variable

Install an R package

Install new packages: install.packages("PACKAGE_NAME")

Load a package to use its functions in the current session: library(PACKAGE_NAME)

Note: install.packages() requires the package name in quotes (" or '), but library() does not.

See the list of installed packages with installed.packages()
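
A minimal sketch of checking whether a package is already installed before installing it. The helper is_installed is hypothetical (not part of base R), and PACKAGE_NAME is a placeholder.

```r
# Hypothetical helper: TRUE if the named package is already installed
is_installed <- function(pkg) {
  pkg %in% rownames(installed.packages())
}

# "stats" ships with R, so this should be TRUE
is_installed("stats")

# Install only if missing (commented out so nothing downloads by accident)
# if (!is_installed("PACKAGE_NAME")) install.packages("PACKAGE_NAME")
```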

Exercise: Install the swirl R package

swirl is a package with a great collection of interactive R courses.

Install swirl: install.packages("swirl")

Bring the package into your environment and install the “R Programming” course:

library(swirl)

install_course("R Programming")

Exercise: Run the swirl R package

Type

swirl()

and select the first lesson, 1: Basic Building Blocks.

Knowledge Test

  • What symbol is used to assign a value to a variable?

  • What does c() do?

  • How can you quickly bring up help on a function?

  • What is the output of the following code?

values <- c(5, 10, 100)
div <- 5

result <- values / div
result
[1]  1  2 20

Data Types

4 main types: numeric, character, Date/POSIXct, and logical

Data Type | Description | Examples
numeric | integers, decimals, positive and negative numbers | 500, 3.4, -6, 0
character (or factor) | text data; factor data types have "levels" | "Hello world", c("agree", "disagree", "neutral")

Data Types (continued)

Data Type | Description | Examples | Helper Functions
dates | Date or POSIXct (date & time) | "2019-01-25", "June 20 2007", "Fri Sep 16 21:07:56 2022" | Sys.Date, date, as.Date, format, functions from the lubridate package
logical | true or false (TRUE = 1 and FALSE = 0 in numeric form) | 2 == 3, 6 != 5, 2 < 3 | is.logical

Reveal the data type of any variable using the class function.

class(86)
[1] "numeric"
class("Hello DAPT class")
[1] "character"
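
The same check works for the other types in the tables. A short sketch, where course_date is a hypothetical variable name:

```r
class(TRUE)                # "logical"
class(2 < 3)               # "logical" (a comparison returns TRUE or FALSE)

course_date <- as.Date("2019-01-25")
class(course_date)         # "Date"
format(course_date, "%Y")  # "2019"

# Logical values behave like 1 and 0 in arithmetic
two <- TRUE + TRUE
two                        # 2

# A whole number is "numeric" unless written with an L suffix
class(5L)                  # "integer"
```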

Factor Vectors

Factors are vectors used to work with categorical variables and have a known and fixed set of values.

months <- c('Mar', 'Feb', 'Jan')
class(months)
[1] "character"

What does it look like if we sort this character vector?

sort(months)
[1] "Feb" "Jan" "Mar"
months <- as.factor(months)
levels(months)
[1] "Feb" "Jan" "Mar"

The levels are an attribute of factors that define all possible elements and can define the order.

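# factor(..., levels = ) sets the level order; assigning to levels() would rename the values by position
months <- factor(months, levels = c('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'))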
sort(months)
[1] Jan Feb Mar
Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
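
One subtlety worth checking when setting level order: factor(x, levels = ...) reorders the levels and leaves each value alone, while assigning to levels(x) renames the existing levels by position. A minimal sketch with a hypothetical sizes vector:

```r
sizes <- as.factor(c("small", "large", "medium"))
levels(sizes)                      # "large" "medium" "small" (alphabetical)

# factor(..., levels = ) sets the order and leaves each value unchanged
ordered_sizes <- factor(sizes, levels = c("small", "medium", "large"))
as.character(ordered_sizes)        # "small" "large" "medium"
as.character(sort(ordered_sizes))  # "small" "medium" "large"

# Assigning to levels() renames by position, so the values themselves change
renamed_sizes <- sizes
levels(renamed_sizes) <- c("small", "medium", "large")
as.character(renamed_sizes)        # "large" "small" "medium"
```

In the last step "small" has silently become "large", which is why factor(..., levels = ) is usually the safer choice for ordering.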

Functions

Functions automate tasks and make code repeatable. So far, we’ve used several base R functions like class, as.factor, and sort.

Structure:

function_name(arguments)

where arguments pass the function the information it needs to complete its task. Note: not all functions need arguments (e.g. getwd()).
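
A short sketch of calling base R functions with positional and named arguments. The values and the variable names are made up for illustration.

```r
# Positional argument: the value to round
whole <- round(3.14159)
whole    # 3

# Named argument: digits controls how many decimal places to keep
two_dp <- round(3.14159, digits = 2)
two_dp   # 3.14

# Several named arguments at once
steps <- seq(from = 0, to = 10, by = 5)
steps    # 0 5 10

# No arguments needed: returns the current working directory
getwd()
```

Naming arguments makes a call easier to read and lets you supply them in any order.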

?function_name

Get help or documentation on a function using the ? operator.

Generative AI

https://copilot.microsoft.com/

Use VCU student credentials.

What’s next?

  • Data Exploration

  • Data Wrangling

Happy coding!